set.seed(123)
n.sample <- 200
x1 <- rbinom(n.sample, size = 1, prob = 0.5)
x2 <- runif(n.sample, -5, 5)
a <- 5
b1 <- 3
b2 <- 4
b3 <- -3
e <- rnorm(n.sample, 0, 5)
y <- a + b1 * x1 + b2 * x2 + b3 * x1 * x2 + e
sim.dat <- data.frame(y, x1, x2)
1 Interaction terms in regression
Interaction terms enter the regression equation as the product of two constitutive terms, x_1 and x_2. For this product term x_1x_2, the regression equation adds a separate coefficient \beta_{3}.
y=\beta_{0}+\beta_{1}x_1+\beta_{2}x_2+\beta_{3}x_1x_2+\epsilon
2 Example 1: one binary, one continuous term
In thinking about interaction terms, it helps to first simplify by working through the predictions of the regression equation for different values of two predictors, x_1 and x_2. We can imagine a continuous outcome y, e.g. the income of Hollywood actors, that we predict with two variables. The first, x_1, is a binary variable such as female gender; it takes on values of 0 (males) and 1 (females) only. The second, x_2, is a continuous variable, ranging from −5 to +5, such as a (centered and standardized) measure of age. An interaction term expresses the idea that the effect of one variable depends on the value of the other variable (I use the term “effect” loosely and non-causally here). With these variables, this suggests that the effect of age on actors’ income is different for male actors than for female actors.
- \beta_{1} is the effect of x_1 on y when x_2 is 0:
- \hat{y}=\beta_{0}+\beta_{1}x_1+\beta_{2} \times 0+\beta_{3}x_1 \times 0
- \hat{y}=\beta_{0}+\beta_{1}x_1+0+0
- \beta_{2} is the effect of x_2 on y when x_1 is 0:
- \hat{y}=\beta_{0}+\beta_{1} \times 0+\beta_{2}x_2+\beta_{3} \times 0 \times x_2
- \hat{y}=\beta_{0}+\beta_{2}x_2+0
- When both x_1 and x_2 are not 0, \beta_{3} becomes important, and the effect of x_1 now varies with the value of x_2. We can plug in 1 for x_1 and simplify the equation as follows:
- \hat{y}=\beta_{0}+\beta_{1} \times 1+\beta_{2}x_2+\beta_{3} \times 1 \times x_2
- \hat{y}=\beta_{0}+\beta_{1}+\beta_{2}x_2+\beta_{3} \times x_2
- \hat{y}=(\beta_{0}+\beta_{1})+(\beta_{2}+\beta_{3}) \times x_2
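The algebra above can be verified numerically. The following sketch uses hypothetical toy data (not the sim.dat simulated below) to fit an interaction model and check that a prediction assembled by hand from (\beta_{0}+\beta_{1})+(\beta_{2}+\beta_{3}) \times x_2 matches predict():

```r
# Sketch: verify that for x1 = 1, the fitted line is (b0 + b1) + (b2 + b3) * x2.
# Toy data; the coefficients are chosen for illustration only.
set.seed(1)
x1 <- rbinom(100, size = 1, prob = 0.5)
x2 <- runif(100, -5, 5)
y  <- 5 + 3 * x1 + 4 * x2 - 3 * x1 * x2 + rnorm(100)
fit <- lm(y ~ x1 * x2)
b <- coef(fit)  # order: (Intercept), x1, x2, x1:x2
by.hand    <- (b[1] + b[2]) + (b[3] + b[4]) * 2           # x1 = 1, x2 = 2
by.predict <- predict(fit, newdata = data.frame(x1 = 1, x2 = 2))
all.equal(unname(by.hand), unname(by.predict))  # TRUE
```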
With simulated data, this can be illustrated easily. I begin by simulating a dataset with 200 observations, two predictors x_1 (binary: male/female) and x_2 (continuous: standardized and centered age), and create y (continuous: income) as the linear combination of x_1, x_2, and an interaction term of the two predictors.
Before advancing, this is what the simulated data look like:
par(mfrow = c(1, 3))
hist(sim.dat$y, main="Histogram of Y", breaks=20, col="#336699")
hist(sim.dat$x1, main="Histogram of X1", breaks=20, col="#336699")
hist(sim.dat$x2, main="Histogram of X2", breaks=20, col="#336699")
par(mfrow = c(1, 1))
For convenience, you could also use the multi.hist() function from the “psych” package. It automatically adds a density curve (dashed line) and a normal density curve (dotted line).
library(psych)
multi.hist(sim.dat, nrow = 1, bcol = "#336699", breaks = 15)
mod.sim <- lm(y ~ x1 * x2, data = sim.dat)
stargazer::stargazer(mod.sim, type = 'html', summary = TRUE, report = "vc*stp", ci = TRUE)

| | Dependent variable: y |
|---|---|
| x1 | 3.872*** |
| | (2.484, 5.260) |
| | t = 5.467 |
| | p = 0.00000 |
| x2 | 3.955*** |
| | (3.630, 4.281) |
| | t = 23.790 |
| | p = 0.000 |
| x1:x2 | -2.944*** |
| | (-3.418, -2.469) |
| | t = -12.160 |
| | p = 0.000 |
| Constant | 4.789*** |
| | (3.824, 5.754) |
| | t = 9.726 |
| | p = 0.000 |
| Observations | 200 |
| R2 | 0.762 |
| Adjusted R2 | 0.758 |
| Residual Std. Error | 4.997 (df = 196) |
| F Statistic | 208.613*** (df = 3; 196) |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
First, I plot the relationship between the continuous predictor x_2 and y for all observations where x_1=0. The regression line is defined by the intercept \beta_0 and the slope \beta_2, as we calculated above. I can extract these from the lm object above using the coef() function, with the index in square brackets selecting the coefficient I need.
plot(x = sim.dat[sim.dat$x1 == 0, ]$x2, y = sim.dat[sim.dat$x1 == 0, ]$y,
col = rgb(red = 0, green = 0, blue = 1, alpha = 0.25), pch = 19,
xlab = "x2", ylab = "y")
abline(a = coef(mod.sim)[1], b = coef(mod.sim)[3], col = "blue", lwd = 2)
For all cases where x_{1}=0, what is the relationship between x2 and y?
Next, we can add the data points where x_{1}=1, along with the separate regression line for these points:
plot(x = sim.dat[sim.dat$x1 == 0, ]$x2, y = sim.dat[sim.dat$x1 == 0, ]$y,
pch = 19, xlab = "x2", ylab = "y", col = rgb(red = 0, green = 0, blue = 1, alpha = 0.25))
abline(a = coef(mod.sim)[1], b = coef(mod.sim)[3], col = "blue", lwd = 2)
points(x = sim.dat[sim.dat$x1 == 1, ]$x2, y = sim.dat[sim.dat$x1 == 1, ]$y,
col = rgb(red = 1, green = 0, blue = 0, alpha = 0.25), pch = 19)
abline(a = coef(mod.sim)[1] + coef(mod.sim)[2], b = coef(mod.sim)[3] + coef(mod.sim)[4],
col = "red", lwd = 2)
For all cases where x_{1}=1, what is the relationship between x2 and y?
2.1 Specifying an interaction
- Although the tutorial does not discuss this, in the prevalent case of using null hypothesis significance tests on interaction effects, interaction terms should be derived from theory.
- In R, you specify an interaction term by putting an asterisk between the two constitutive terms: x1 * x2. This shorthand includes both constitutive terms as well as their product.
See Brambor et al. (2006) for more details. What would it imply if your model only included \beta_{3}x_{1}x_{2}? Omitting the constitutive terms is equivalent to forcing \beta_{1}=0 and \beta_{2}=0. See for yourself what happens when the model is specified with a colon (x1:x2), which adds only the product term without the main effects:
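To see the difference between the two specifications, compare the design matrices they generate. A minimal sketch with a made-up data frame:

```r
# Sketch: '*' expands to both constitutive terms plus the product,
# while ':' adds only the product term. The data frame is made up for illustration.
d <- data.frame(x1 = c(0, 1, 0, 1), x2 = c(-1, -1, 2, 2))
colnames(model.matrix(~ x1 * x2, data = d))
# "(Intercept)" "x1" "x2" "x1:x2"
colnames(model.matrix(~ x1:x2, data = d))
# "(Intercept)" "x1:x2"
```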
mod.sim2 <- lm(y ~ x1:x2, data = sim.dat)
stargazer::stargazer(mod.sim2, type = 'html', summary = TRUE, report = "vc*stp", ci = TRUE)

| | Dependent variable: y |
|---|---|
| x1:x2 | 0.959*** |
| | (0.270, 1.647) |
| | t = 2.729 |
| | p = 0.007 |
| Constant | 6.656*** |
| | (5.268, 8.043) |
| | t = 9.404 |
| | p = 0.000 |
| Observations | 200 |
| R2 | 0.036 |
| Adjusted R2 | 0.031 |
| Residual Std. Error | 9.995 (df = 198) |
| F Statistic | 7.448*** (df = 1; 198) |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
plot(x = sim.dat[sim.dat$x1 == 0, ]$x2, y = sim.dat[sim.dat$x1 == 0, ]$y,
pch = 19, col = rgb(red = 0, green = 0, blue = 1, alpha = 0.25),
xlab = "x2", ylab = "y")
points(x = sim.dat[sim.dat$x1 == 1, ]$x2, y = sim.dat[sim.dat$x1 == 1, ]$y,
col = rgb(red = 1, green = 0, blue = 0, alpha = 0.25), pch = 19)
abline(a = coef(mod.sim2)[1], b = coef(mod.sim2)[2], lwd = 2)
3 Example 2: Two continuous variables
The first example illustrates the general logic of interaction terms, and it carries over directly to the case of an interaction between two continuous variables. Let’s now simulate another (fake) dataset, with a continuous outcome y being the income of professional baseball players. We assume that x_{1} is a continuous variable, distributed normally with a mean of 2 and a standard deviation of 5 (e.g., a standardized measure of each player’s injury risk). As before, x_{2} is a continuous variable (such as standardized and centered age), ranging from −5 to +5. We might expect a conditional effect, where older athletes make more, but only if they are rated as low injury risk; older athletes with higher injury risk might make less money. With this configuration,
- \beta_{1} is the effect of x_{1} on y when x_{2} is 0:
- \hat{y}=\beta_{0}+\beta_{1}x_{1}+\beta_{2} \times 0+\beta_{3}x_{1} \times 0
- \hat{y}=\beta_{0}+\beta_{1}x_{1}+0+0
- \beta_{2} is the effect of x_{2} on y when x_{1} is 0:
- \hat{y}=\beta_{0}+\beta_{1} \times 0+\beta_{2}x_{2}+\beta_{3} \times 0 \times x_{2}
- \hat{y}=\beta_{0}+\beta_{2}x_{2}+0
- When both x_{1} and x_{2} are not 0, \beta_{3} becomes important, and the effect of x_{1} now varies with the value of x_{2}. We can again plug in 1 for x_{1} and simplify the equation as follows:
- \hat{y}=\beta_{0}+\beta_{1} \times 1+\beta_{2}x_{2}+\beta_{3} \times 1 \times x_{2}
- \hat{y}=\beta_{0}+\beta_{1}+\beta_{2}x_{2}+\beta_{3} \times x_{2}
- \hat{y}=(\beta_{0}+\beta_{1})+(\beta_{2}+\beta_{3}) \times x_{2}
- But x_{1} takes on many other values than just 1. The more general equation for y is then (cf. Brambor et al. p. 11)
- y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{1}x_{2}+\epsilon
- In this case, we cannot define just “two” equations (for x_{1}=1 and x_{1}=0). Instead, the effect of x_{1} varies with x_{2}, and vice versa.
- To express this, we use the term “marginal effects” or “conditional effects”: the effect of x_{1} conditional on the value of x_{2}.
- This marginal effect for x_{1} is a slope, and this slope is the sum of the coefficient for x_{1} and the product of \beta_{3} and the value of x_{2}: \beta_{1}+\beta_{3} \times x_{2}.
- Similarly, the marginal effect for x_{2} is \beta_{2}+\beta_{3} \times x_{1}.
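The marginal effect \beta_{1}+\beta_{3} \times x_{2} can be checked against a numerical derivative: in a linear model it equals the change in the prediction for a one-unit change in x_{1}, holding x_{2} fixed. A sketch with toy data (the coefficients here are assumptions for illustration):

```r
# Sketch: the marginal effect of x1 at a given x2 equals b1 + b3 * x2.
set.seed(2)
x1 <- rnorm(200, mean = 2, sd = 5)
x2 <- runif(200, -5, 5)
y  <- 5 + 3 * x1 + 4 * x2 - 3 * x1 * x2 + rnorm(200, sd = 5)
fit <- lm(y ~ x1 * x2)
b <- coef(fit)
me_at <- function(x2.val) unname(b["x1"] + b["x1:x2"] * x2.val)
# Numerical check: difference in predictions for x1 = 1 vs x1 = 0, at x2 = 3
p0 <- predict(fit, newdata = data.frame(x1 = 0, x2 = 3))
p1 <- predict(fit, newdata = data.frame(x1 = 1, x2 = 3))
all.equal(me_at(3), unname(p1 - p0))  # TRUE
```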
I use the same simulation setup above, but make x_{1} a continuous variable, normally distributed with a mean of 2 and a standard deviation of 5.
set.seed(123)
n.sample <- 200
x1 <- rnorm(n.sample, mean = 2, sd = 5)
x2 <- runif(n.sample, -5, 5)
a <- 5
b1 <- 3
b2 <- 4
b3 <- -3
e <- rnorm(n.sample, 0, 20)
y <- a + b1 * x1 + b2 * x2 + b3 * x1 * x2 + e
sim.dat2 <- data.frame(y, x1, x2)
Before advancing, this is what the simulated data look like:
multi.hist(sim.dat2, nrow = 1, breaks = 15)
Fitting a model using what we know about the data-generating process gives us:
mod.sim3 <- lm(y ~ x1 * x2, data = sim.dat2)
stargazer::stargazer(mod.sim3, type = 'html', summary = TRUE, report = "vc*stp", ci = TRUE)

| | Dependent variable: y |
|---|---|
| x1 | 2.592*** |
| | (1.993, 3.192) |
| | t = 8.475 |
| | p = 0.000 |
| x2 | 4.103*** |
| | (3.059, 5.146) |
| | t = 7.704 |
| | p = 0.000 |
| x1:x2 | -3.038*** |
| | (-3.238, -2.839) |
| | t = -29.883 |
| | p = 0.000 |
| Constant | 6.495*** |
| | (3.441, 9.549) |
| | t = 4.168 |
| | p = 0.00005 |
| Observations | 200 |
| R2 | 0.835 |
| Adjusted R2 | 0.832 |
| Residual Std. Error | 20.343 (df = 196) |
| F Statistic | 330.285*** (df = 3; 196) |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
We can now calculate the effect of x_{1} on y at different levels of x_{2}, say, across the range of x_{2} with increments of 1. In R, I create this sequence using the seq() function.
x2.sim <- seq(from = -5, to = 5, by = 1)
x2.sim
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
eff.x1 <- coef(mod.sim3)[2] + coef(mod.sim3)[4] * x2.sim
I can now list the effect of x_{1} at different levels of x_{2}:
eff.dat <- data.frame(x2.sim, eff.x1)
eff.dat %>% kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "center")

| x2.sim | eff.x1 |
|---|---|
| -5 | 17.7837936 |
| -4 | 14.7454971 |
| -3 | 11.7072007 |
| -2 | 8.6689042 |
| -1 | 5.6306077 |
| 0 | 2.5923113 |
| 1 | -0.4459852 |
| 2 | -3.4842817 |
| 3 | -6.5225781 |
| 4 | -9.5608746 |
| 5 | -12.5991711 |
The object eff.x1 now contains the effect of x_{1} at the respective levels of x_{2}. In this context, x_{2} is also called the “moderator” or “moderating variable” because it moderates the effect of x_{1}. You can see that x_{1} exhibits a positive effect on y when x_{2} is approximately smaller than 1, at which point the effect of x_{1} on y turns negative.
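The sign change can also be located exactly: the conditional effect \beta_{1}+\beta_{3}x_{2} is zero at x_{2}=-\beta_{1}/\beta_{3}. A sketch that re-simulates data with the same data-generating process as above (true coefficients 3 and -3, so the true crossover is at x_{2}=1):

```r
# Sketch: the conditional effect b1 + b3 * x2 crosses zero at x2 = -b1 / b3.
# Same data-generating process as sim.dat2 above; the true crossover is 3 / 3 = 1.
set.seed(123)
x1 <- rnorm(200, mean = 2, sd = 5)
x2 <- runif(200, -5, 5)
y  <- 5 + 3 * x1 + 4 * x2 - 3 * x1 * x2 + rnorm(200, mean = 0, sd = 20)
fit <- lm(y ~ x1 * x2)
b <- coef(fit)
crossover <- unname(-b["x1"] / b["x1:x2"])
crossover  # close to the true value of 1
```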
You can now visualize this, just as you did in the binary-continuous interaction above. Because x_{2}, the moderating variable, is now continuous, there is an (almost) infinite number of separate regression lines for x_{1} and y: each individual value of x_{2} creates a separate regression line (with a separate slope coefficient). The following scatterplot plots observations against their values on x_{1} and y and colors them based on their value of x_{2}, as in the previous example.
rbPal <- colorRampPalette(colors = c("red", "blue"))
sim.dat2$x2.col <- rbPal(10)[as.numeric(cut(sim.dat2$x2,breaks = 10))]
plot(x = sim.dat2$x1, y = sim.dat2$y, col = sim.dat2$x2.col,
pch = 19, xlab = "x1", ylab = "y")
Then, I add a regression line for each value of x_{2} that I just calculated. The lines are also colored according to the values of x_{2}.
eff.dat$x2.col <- rbPal(10)[as.numeric(cut(eff.dat$x2.sim, breaks = 10))]
plot(x = sim.dat2$x1, y = sim.dat2$y, col = sim.dat2$x2.col,
     pch = 19, xlab = "x1", ylab = "y")
# apply() coerces each row to character, so the slope is converted back to numeric
invisible(apply(eff.dat, 1, function(x) abline(a = coef(mod.sim3)[1], b = as.numeric(x[2]), col = x[3])))
You can see that at low values of x_{2} (red), the relationship between x_{1} and y is positive (upward lines), whereas at higher values of x_{2} (blue), the relationship is negative (downward lines).
4 Why it is important to inspect your data first
As you’ve heard before, and as Hainmueller et al.’s working paper implies, you should always inspect your data. This also applies to interactions. Always check whether:
- Your data have observations across the whole range of each of the constitutive terms. It is best to do this by creating simple scatterplots of the form you see in Hainmueller et al.
- The raw relationship between one explanatory variable and the outcome variable is linear at different levels of the moderating variable. This is easy for binary moderators. For continuous moderators, Hainmueller et al. suggest splitting the observations into low, medium, and high levels of the continuous moderator.
You should see this as an opportunity to learn something about your data rather than a nuisance.
Here is an example of how I would inspect my data from the first model, which contained an interaction between a dichotomous (x_{1}) and continuous (x_{2}) variable:
par(mfrow = c(1, 2))
plot(x = sim.dat[sim.dat$x1 == 0, ]$x2,
y = sim.dat[sim.dat$x1 == 0, ]$y,
main = "x1 = 0", xlab = "x2", ylab = "y")
plot(x = sim.dat[sim.dat$x1 == 1, ]$x2,
y = sim.dat[sim.dat$x1 == 1, ]$y,
main = "x1 = 1", xlab = "x2", ylab = "y")
par(mfrow = c(1, 1))
I see no reason to be concerned about the distribution of x_{2} or the linearity of the relationship between x_{2} and y, and this makes sense, given how I simulated the data. But see Hainmueller et al. for examples where this does become a problem.
Next, here is an example of how I would inspect my data from the third model above, which contained an interaction between a continuous (x_{1}) and continuous (x_{2}) variable. First, I split the moderator (again, I choose x_{1}) in three groups. For this, I use the cut() function (see the R Cookbook for more info).
sim.dat2$x1_tri <- cut(x = sim.dat2$x1,
breaks = quantile(x = sim.dat2$x1, probs = c(0, 0.33, 0.66, 1)),
include.lowest = TRUE)
table(sim.dat2$x1_tri) %>% kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "center")

| Var1 | Freq |
|---|---|
| [-9.55,-0.151] | 66 |
| (-0.151,3.66] | 66 |
| (3.66,18.2] | 68 |
par(mfrow = c(1, 3))
plot(x = sim.dat2[sim.dat2$x1_tri == levels(sim.dat2$x1_tri)[1], ]$x2,
y = sim.dat2[sim.dat2$x1_tri == levels(sim.dat2$x1_tri)[1], ]$y,
main = paste("x1 = ", levels(sim.dat2$x1_tri)[1], sep = ""),
xlab = "x2", ylab = "y")
plot(x = sim.dat2[sim.dat2$x1_tri == levels(sim.dat2$x1_tri)[2], ]$x2,
y = sim.dat2[sim.dat2$x1_tri == levels(sim.dat2$x1_tri)[2], ]$y,
main = paste("x1 = ", levels(sim.dat2$x1_tri)[2], sep = ""),
xlab = "x2", ylab = "y")
plot(x = sim.dat2[sim.dat2$x1_tri == levels(sim.dat2$x1_tri)[3], ]$x2,
y = sim.dat2[sim.dat2$x1_tri == levels(sim.dat2$x1_tri)[3], ]$y,
main = paste("x1 = ", levels(sim.dat2$x1_tri)[3], sep = ""),
xlab = "x2", ylab = "y")
par(mfrow = c(1, 1))
Again, given how I simulated these data, there is no reason to be concerned about the distribution of x_{2} or the linearity of the relationship between x_{2} and y.
5 Obtaining standard errors for marginal effects
As usual, you should evaluate the variance around coefficients. The online appendix to Brambor et al. (2006) gives you the formula for this variance:
For the variance of the marginal/conditional effect of x_{1}, use this equation:
\mathbb{V}ar(\beta_{1}+\beta_{3}x_{2})=\mathbb{V}ar(\beta_{1})+x_{2}^{2} \times \mathbb{V}ar(\beta_{3})+2 \times x_{2} \times \mathbb{C}ov(\beta_{1},\beta_{3})
The standard error is then easily calculated as the square root of the variance.
Note that this equation makes use of the covariance of estimates. You learned about variance and covariance on Day 10, and you can access the variance-covariance matrix with the vcov() function. You can then extract cells of the matrix by using the familiar square brackets, where the first number stands for rows and the second for columns.
vcov(mod.sim3) %>% kbl %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "center")

| | (Intercept) | x1 | x2 | x1:x2 |
|---|---|---|---|---|
| (Intercept) | 2.4277431 | -0.1831418 | -0.0009989 | -0.0018731 |
| x1 | -0.1831418 | 0.0935660 | -0.0018849 | 0.0007609 |
| x2 | -0.0009989 | -0.0018849 | 0.2835933 | -0.0200801 |
| x1:x2 | -0.0018731 | 0.0007609 | -0.0200801 | 0.0103373 |
Now, you can obtain the standard error of the conditional effect by taking the square root of its variance. In this code, I nest the equation above in the sqrt() function.
eff.dat$se.eff.x1 <- sqrt(vcov(mod.sim3)[2, 2] +
eff.dat$x2.sim^2 * vcov(mod.sim3)[4, 4] +
2 * eff.dat$x2.sim * vcov(mod.sim3)[2, 4])
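As a sanity check, the variance formula above is the quadratic form g'Vg, where g = (0, 1, 0, x_{2}) is the gradient of the conditional effect with respect to the coefficient vector ((Intercept), x_{1}, x_{2}, x_{1}x_{2}). The sketch below re-simulates toy data (an illustration, not the tutorial's mod.sim3) and confirms the two computations agree:

```r
# Sketch: the hand-coded variance formula equals the quadratic form g' V g.
set.seed(3)
x1 <- rnorm(200, mean = 2, sd = 5)
x2 <- runif(200, -5, 5)
y  <- 5 + 3 * x1 + 4 * x2 - 3 * x1 * x2 + rnorm(200, sd = 20)
fit <- lm(y ~ x1 * x2)
V <- vcov(fit)
x2.val <- 2
se_formula <- sqrt(V[2, 2] + x2.val^2 * V[4, 4] + 2 * x2.val * V[2, 4])
g <- c(0, 1, 0, x2.val)  # gradient of the conditional effect w.r.t. the coefficients
se_quadform <- sqrt(as.numeric(t(g) %*% V %*% g))
all.equal(se_formula, se_quadform)  # TRUE
```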
eff.dat %>% kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F,
position = "center")

| x2.sim | eff.x1 | x2.col | se.eff.x1 |
|---|---|---|---|
| -5 | 17.7837936 | #FF0000 | 0.5868481 |
| -4 | 14.7454971 | #FF0000 | 0.5028682 |
| -3 | 11.7072007 | #E2001C | 0.4266577 |
| -2 | 8.6689042 | #C60038 | 0.3631416 |
| -1 | 5.6306077 | #AA0055 | 0.3199713 |
| 0 | 2.5923113 | #8D0071 | 0.3058857 |
| 1 | -0.4459852 | #71008D | 0.3246924 |
| 2 | -3.4842817 | #5500AA | 0.3714283 |
| 3 | -6.5225781 | #3800C6 | 0.4372270 |
| 4 | -9.5608746 | #1C00E2 | 0.5148307 |
| 5 | -12.5991711 | #0000FF | 0.5996737 |
6 Plotting the marginal/conditional effect
It is usually (always, really) more useful to plot the marginal effect of a variable across the range of the second variable in the interaction, rather than plotting separate regression lines as I did above. I strongly recommend this approach. You already have the tools to do this:
plot(x = eff.dat$x2.sim, y = eff.dat$eff.x1, type = "l", xlab = "x2",
ylab = "Conditional effect of x1")
abline(h = 0, col = "grey", lty = "dashed")
The solid line shows the conditional effect of x_{1} across the range of x_{2}. You can add 95% confidence intervals by plotting lines representing the conditional effect \pm 1.96 \times SE.
plot(x = eff.dat$x2.sim, y = eff.dat$eff.x1, type = "l", xlab = "x2",
ylab = "Conditional effect of x1")
abline(h = 0, col = "grey", lty = "dashed")
lines(x = eff.dat$x2.sim, y = eff.dat$eff.x1 + 1.96 * eff.dat$se.eff.x1, lty = "dashed", col="#DD1E2F")
lines(x = eff.dat$x2.sim, y = eff.dat$eff.x1 - 1.96 * eff.dat$se.eff.x1, lty = "dashed", col="#DD1E2F")
7 Example 3: “Electoral Institutions, Unemployment and Extreme Right Parties: A Correction.”
The final example uses real-world data on the electoral success of extreme right-wing parties in 16 countries over 102 elections. These data were used in Matt Golder’s BJPS article “Electoral Institutions, Unemployment and Extreme Right Parties: A Correction.” (see Brambor et al. 2006 for the full citation).
ei.dat <- rio::import("../datasets/golder.2003.csv")
summary(ei.dat) %>% kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F,
position = "center")

| country | year | unemp | enp | lerps | thresh |
|---|---|---|---|---|---|
| Length:102 | Min. :1970 | Min. : 0.300 | Min. :1.72 | Min. :0.0000 | Min. : 0.670 |
| Class :character | 1st Qu.:1975 | 1st Qu.: 2.125 | 1st Qu.:2.79 | 1st Qu.:0.0000 | 1st Qu.: 2.600 |
| Mode :character | Median :1980 | Median : 5.300 | Median :3.40 | Median :0.4700 | Median : 5.000 |
| | Mean :1980 | Mean : 5.811 | Mean :3.59 | Mean :0.9208 | Mean : 8.807 |
| | 3rd Qu.:1985 | 3rd Qu.: 8.375 | 3rd Qu.:4.63 | 3rd Qu.:1.8284 | 3rd Qu.: 9.875 |
| | Max. :1990 | Max. :20.800 | Max. :5.10 | Max. :3.3142 | Max. :35.000 |
par(mfrow = c(2, 2))
hist(ei.dat$lerps, main="Histogram of Log voteshare ERP", breaks=20, col="#336699")
hist(ei.dat$thresh, main="Histogram of Effective Electoral Threshold", breaks=20, col="#336699")
hist(ei.dat$enp, main="Histogram of Effective Number of parties", breaks=20, col="#336699")
hist(ei.dat$unemp, main="Histogram of Unemployment", breaks=20, col="#336699")
par(mfrow = c(1, 1))

| Variable | Description |
|---|---|
| lerps | Log of extreme right percentage support + 1 |
| thresh | Effective threshold of representation and exclusion in the political system |
| enp | Effective number of parties |
| unemp | Unemployment rate |
| country | Country name |
| year | Election year |
This study aims to determine the relationship between electoral support for extreme right-wing parties (the outcome variable), the threshold for representation in the political system (the first predictor), and the effective number of parties (the second predictor). The author hypothesized that the effect of representation thresholds might vary depending on the effective number of parties. The unemployment rate at the time of the election is a control variable. Because elections in the same country might be subject to idiosyncratic dynamics, the author includes a dummy variable for each country (cf. our discussion on Days 9 and 10).
After fitting the model, I display the results with the stargazer() function, as before. Note that this prints the coefficient for every country dummy variable; with the screenreg() function from the “texreg” package, you could use the omit.coef option to suppress them.
m3 <- lm(lerps ~ thresh * enp + unemp + factor(country), data = ei.dat)
stargazer::stargazer(m3, type = 'html', summary = TRUE, report = "vc*stp", ci = TRUE)

| | Dependent variable: lerps |
|---|---|
| thresh | 0.267*** |
| | (0.143, 0.391) |
| | t = 4.216 |
| | p = 0.0001 |
| enp | 1.159*** |
| | (0.311, 2.008) |
| | t = 2.677 |
| | p = 0.009 |
| unemp | 0.070*** |
| | (0.040, 0.100) |
| | t = 4.541 |
| | p = 0.00002 |
| factor(country)belg | -3.513*** |
| | (-5.354, -1.672) |
| | t = -3.740 |
| | p = 0.0004 |
| factor(country)denm | -2.844*** |
| | (-4.904, -0.785) |
| | t = -2.706 |
| | p = 0.009 |
| factor(country)finl | -3.889*** |
| | (-6.039, -1.740) |
| | t = -3.546 |
| | p = 0.001 |
| factor(country)fran | -0.354 |
| | (-2.124, 1.415) |
| | t = -0.393 |
| | p = 0.696 |
| factor(country)germ | -2.280*** |
| | (-3.048, -1.511) |
| | t = -5.816 |
| | p = 0.00000 |
| factor(country)gree | -2.365*** |
| | (-2.924, -1.806) |
| | t = -8.296 |
| | p = 0.000 |
| factor(country)irel | -2.886*** |
| | (-3.819, -1.953) |
| | t = -6.062 |
| | p = 0.00000 |
| factor(country)ital | -1.600*** |
| | (-2.685, -0.515) |
| | t = -2.891 |
| | p = 0.005 |
| factor(country)neth | -4.452*** |
| | (-6.268, -2.635) |
| | t = -4.803 |
| | p = 0.00001 |
| factor(country)norw | -0.948* |
| | (-1.997, 0.101) |
| | t = -1.771 |
| | p = 0.081 |
| factor(country)port | -2.432*** |
| | (-3.167, -1.697) |
| | t = -6.484 |
| | p = 0.000 |
| factor(country)spai | -0.271 |
| | (-1.035, 0.492) |
| | t = -0.696 |
| | p = 0.489 |
| factor(country)swed | -2.827*** |
| | (-3.752, -1.902) |
| | t = -5.993 |
| | p = 0.00000 |
| factor(country)swit | -1.670 |
| | (-3.914, 0.573) |
| | t = -1.459 |
| | p = 0.149 |
| factor(country)uk | -3.796*** |
| | (-5.296, -2.296) |
| | t = -4.960 |
| | p = 0.00001 |
| thresh:enp | -0.099*** |
| | (-0.140, -0.057) |
| | t = -4.667 |
| | p = 0.00002 |
| Constant | -0.967 |
| | (-3.120, 1.187) |
| | t = -0.880 |
| | p = 0.382 |
| Observations | 102 |
| R2 | 0.853 |
| Adjusted R2 | 0.819 |
| Residual Std. Error | 0.428 (df = 82) |
| F Statistic | 25.009*** (df = 19; 82) |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
Rather than trying to interpret these results from the regression table, I calculate the marginal effect of electoral thresholds across the range of the effective number of parties. This step is exactly analogous to what I did with simulated data above.
thresh_sim <- seq(from = min(ei.dat$thresh), to = max(ei.dat$thresh), length.out = 50)
enp_sim <- seq(from = min(ei.dat$enp), to = max(ei.dat$enp), length.out = 50)
eff_thresh <- coef(m3)[2] + coef(m3)[20] * enp_sim
eff_thresh_se <- sqrt(vcov(m3)[2, 2] + enp_sim^2 * vcov(m3)[20, 20] + 2 * enp_sim * vcov(m3)[2, 20])
plot(x = enp_sim, y = eff_thresh, type = "l",
xlab = "Effective number of parties",
ylab = "Conditional effect of the electoral threshold")
lines(x = enp_sim, y = eff_thresh + 1.96 * eff_thresh_se, lty = "dashed", col="#DD1E2F")
lines(x = enp_sim, y = eff_thresh - 1.96 * eff_thresh_se, lty = "dashed", col="#DD1E2F")
abline(h = 0, lty = "dashed", col = "#336699")
What do you conclude from this figure? What do these results say about the author’s hypothesis?
8 Canned R functions to plot marginal effects in OLS
Rather than calculating marginal effects by hand, you might want to use R functions to plot marginal effects with less effort.
8.1 DAintfun2()
One good option is Dave Armstrong’s DAintfun2() function, which is part of his “DAMisc” package. You need to first install the package, then load it, and then specify the function with at least the following arguments:
# install.packages("DAMisc")
library(DAMisc)
DAintfun2(m3, varnames = c("thresh", "enp"),
rug = TRUE, hist = TRUE)
DAintfun2() by default plots the marginal effect of the first variable as well as that of the second variable. It also allows you to include an indicator of the actual distribution of the variables, via a histogram or a rug plot. With these supplementary plots, you can immediately see whether the estimated effect of a variable corresponds to real observations at that value of the moderating variable.
set.seed(123)
n.sample <- 200
x1 <- rbinom(n.sample, size = 1, prob = 0.5)
x2 <- runif(n.sample, -5, 5)
a <- 5
b1 <- 3
b2 <- 4
b3 <- -3
e <- rnorm(n.sample, 0, 5)
y <- a + b1 * x1 + b2 * x2 + e
sim.dat <- data.frame(y, x1, x2)
head(sim.dat)

          y x1         x2
1  5.543093  0 -2.6127397
2 33.056422  1  4.6235894
3  7.728904  0  1.0136573
4 11.317159  1  0.1502973
5  2.031234  1 -0.9742666
6 17.828627  0  3.8024654